Whole-genome sequence and genesis of an avian influenza virus H5N1 isolated from a healthy chicken in a live bird market in Indonesia: accumulation of mammalian adaptation markers in avian hosts

Background Influenza A viruses are a major pathogen that causes significant clinical and economic harm to many animals. In Indonesia, the highly pathogenic avian influenza (HPAI) H5N1 virus has been endemic in poultry since 2003 and has caused sporadic deadly infections in humans. The genetic bases that determine host range have not yet been fully elucidated. We analyzed the whole-genome sequence of a recent H5 isolate to reveal the evolution toward its mammalian adaptation. Methods We determined the whole-genome sequence of A/chicken/East Java/Av1955/2022 (hereafter, “Av1955”) from a healthy chicken in April 2022 and conducted phylogenetic and mutational analysis. Results Phylogenetic analysis revealed that Av1955 belonged to the H5N1 clade 2.3.2.1c (Eurasian lineage). The six gene segments (PB1, PB2, HA, NP, NA, and NS) out of the eight segments derived from viruses of H5N1 Eurasian lineage, one (PB2) from the H3N6 subtype and the remaining one (M) from the H5N1 clade 2.1.3.2b (Indonesian lineage). The donor of the PB2 segment was a reassortant among three viruses of H5N1 Eurasian and Indonesian lineages and the H3N6 subtype. The HA amino acid sequence contained multiple basic amino acids at the cleavage site. Mutation analysis revealed that Av1955 possessed the maximal number of mammalian adaptation marker mutations. Conclusions Av1955 was a virus of H5N1 Eurasian lineage. The HA protein contains an HPAI H5N1-type cleavage site sequence, while the virus was isolated from a healthy chicken suggesting its low pathogenicity nature. The virus has increased mammalian adaptation markers by mutation and intra- and inter-subtype reassortment, gathering gene segments possessing the most abundant maker mutations among previously circulating viruses. The increasing mammalian adaptation mutation in avian hosts suggests that they might be adaptive to infection in mammalian and avian hosts. It highlights the importance of genomic surveillance and adequate control measures for H5N1 infection in live poultry markets.


INTRODUCTION
Influenza A viruses are a major pathogen that causes significant clinical and economic harm to various species, including poultry, pigs, horses, marine mammals, and humans (Peiris, de Jong & Guan, 2007;Webster et al., 1992). The surface antigenicity of the virus particles divides them into 18 hemagglutinin (HA) (H1 to H18) and 11 neuraminidase (NA) (N1 to N11) subtypes (Tong et al., 2013). Highly pathogenic avian influenza (HPAI) H5N1 viruses are capable of sporadic human infection. They have the potential to cause severe sickness with a high case fatality rate among people who have been hospitalized and proven to have the virus. To date, there have been 861 confirmed disease cases in people, and 455 have led to death (WHO H5N1, 2021). Indonesia reported 200 confirmed human cases and 168 fatalities. The 168 fatalities account for over one-third of all deaths among all affected countries. However, Indonesia had only two cases in , zero in 2016, and one in 2017, the last (World Health Organization, 2015World Health Organization, 2019;WHO H5N1, 2021).
The HPAI H5N1 subtype was first discovered in Indonesia in 2003 (Li et al., 2004), spreading to numerous regions and killing over 16 million chickens by the end of 2007 (Lam et al., 2008;Putri, Widyarini & Asmara, 2019 (Dharmayanti et al., 2014). The clade became prevalent as early as 2013. Shimizu et al. (2016) reported the isolations of three avian influenza viruses in East Java, Indonesia: H5N1 clade 2.3.2.1c (Av154) from an HPAI outbreak in a turkey farm in 2013, H5N1 clade 2.1.3.2b (Av240) from an ill chicken at a live poultry market in 2014, and H3N6 (Av39) from a mildly ill duck at a live poultry market in 2013 (Shimizu et al., 2016 Mertens et al. (2013) identified 149 phenotypic marker mutations related to host tropism or increased pathogenicity in mammals from previously reported literature. Recently, Rehman et al. (2022) reported detections of avian influenza A/H5 viruses in poultry at live bird markets in East Java, Indonesia. In this study, we determined the whole-genome sequence of one of the detected viruses to identify the clade/lineage and some characteristic features of the amino acid sequences of the viral proteins. We compared the phenotypic marker mutations with its ancestral viruses to reveal the evolution toward its mammalian adaptation.

Ethical approval
The Animal Care and Use Committee at the Faculty of Veterinary Medicine, Universitas Airlangga, Surabaya, Indonesia gave their approval for every step of this study (approval no. 1.KE.028.03.2021). Rehman et al. (2022) conducted a cross-sectional study from March 2021 to April 2022, collecting 600 tracheal and cloaca swab samples from live bird markets in East Java, Indonesia. The authors isolated an avian influenza A/H5 virus from a tracheal swab sample collected on 5 March 2022 from a healthy chicken, which showed no symptoms of lethargy, shortened neck, retracted feather, diarrhea, torticollis, and dyspnea. For this research, they provided a one-mL aliquot of an allantoic fluid harvested from the A/H5 virus-infected egg. The harvest was positive for the hemagglutination test with chicken red blood cells. It was also positive for one-step TaqMan real-time RT-PCR tests targeting the influenza A virus M gene and H5 HA gene using sets of primers and probes as previously described (Shimizu et al., 2016) (data, not shown). All procedures were performed in the BSL3 laboratory of the Institute of Tropical Disease, Airlangga University. We named the virus A/chicken/East Java/Av1955/2022 (hereafter, "Av1955").

Whole-genome sequencing
We performed genome analysis by next-generation sequencing as previously described (Novianti et al., 2019). Total RNA was isolated from the allantoic harvest using a QIAamp viral Minikit (Qiagen, Hilden, Germany). Linear polyacrylamide was used as a transporter rather than tRNA. A TruSeq RNA sample preparation kit version 2 was utilized to compile an RNA library (Illumina, San Diego, CA, USA). The library was loaded in the flow cell of the 300-cycle MiSeq reagent kit version 2 (Illumina, San Diego, CA, USA). A MiSeq system (Illumina, San Diego, CA, USA) sequenced the barcode-labeled multiplex library, which had two runs of 150 bp each. The system created the FASTQ files of sequence reads removing the primer and adaptor sequences. For analysis, the files were imported into CLC Genomics Workbench version 8.1 (CLC bio, Osaka, Japan). The reads were mapped to the genomes of 27 reference viruses of influenza type A virus, including all subtypes of HA (H1 to H18) and NA (N1 to N11). Tentative complete eight-segment genome sequences were constructed from the assembled consensus with the best coverage and common sequences of type A influenza viruses at the 5′ end (12 nucleotides (nt)) and 3′ end (13 nt) of the genome segments. The reads were mapped again to the tentative complete genome sequence to assemble Av1955 consensus sequences. The assembled sequences covered 97.4% to 99.5% of the complete genome-segments mostly lacking the 5′ and 3′ consensus sequences. The sequences of the eight genome-segments were submitted to the GISAID database with isolate ID: EPI_ISL_13690275.

Genetic analysis
BLAST analysis in the GISAID EpiFlu database (https://gisaid.org) on 10 August 2022 identified viruses with the highest identity to the nucleotide sequence of one or more gene segments of Av1955.
The genetic information processing software Genetyx v14 (Genetyx Co., Tokyo, Japan) generated the phylogenetic trees using the RAxML with 100 bootstrap replicates. The amino acid sequences of the viral proteins were decoded from the genome nucleotide sequences and analyzed for nonsynonymous mutations. The putative host adaptation mutations were analyzed according to the evaluation of phenotypic markers described by Mertens et al. (2013). We noted the accession number of each genome segment of the viruses in Table S1.

HA and NA subtypes of Av1955
The BLAST analysis revealed that the HA sequence of Av1955 was most close to Spg119 of H5N1 (the sharing identity, 97.7%) and the NA to Av1210 of H5N1 (97.3%) among the viruses in the GISAD database (Table 1). The HA and NA segments were most close to Av154 of H5N1 Eurasian lineage (HA clade 2.3.2.1c) (95.1% and 96.2%, respectively) among the three isolates (Av39, Av240, and Av154) during 2013-2014 in East Java (Table 1). In the phylogenies, the HA and NA segments are on the same branch with Av154 ( Fig. 1). These results clearly indicate that Av1955 is a virus of the H5N1 Eurasian lineage (HA clade 2.3.2.1c).

Genesis of Av1955
The BLAST analysis revealed that the genome sequence of Av1955 was most close to Spg119 for the PB2 segment (the sharing identity, 98.0%), HA (97.7%), NP (98.3%), and NS (98.5%); to Av1534 for the PB1 (97.9%) and PA (98.1%); to Av1210 for the NA (97.3%); and to Eg20616 for the M (98.9%) among the viruses in the GISAD database (Table 1). These results indicated that Av1955 acquired the PB2, HA, NP, and NS segments from an Spg119-like virus; the PB1 and PA from an Av1534-like virus; the NA from an Av1210-like virus; and the M from an Eg2061-like virus. These four putative donors of the segments are H5N1 Eurasian lineage since all HA and NA closely cluster with Av154 of H5N1 Eurasian lineage in the phylogenies of HA and NA (Fig. 1). Av1955 was most close to Av39 for the PB2 (93.2%); to Av154 for the PB1 (95.9%), PA (96.0%), HA (95.2%), NP (97.0%), NA (96.2%), and NS (97.2%); and Av240 for the M (98.1%) among the three isolates in East Java during 2013-2014. In the phylogenies, the PB2 of Av1955 closely clusters with Spg119 and Av39; all of the PB1, PA, HA, NP, and NA cluster with Eg20616, Av1210, Spg119, Av1534, and Av154; the M with Eg20616, Spg119, and Av154; and the NS with Av1210, Spg119, Av1534, and Av154.
The PB2, M, and NS of Eg20616 closely cluster with Av240, while the other six segments with Av154, indicating that Eg206166 is a reassortant between Av240 and Av154; Eg20616 acquired the PB2, M and NS segments from an Av240-like virus, and the other six segments from an Av154-like virus.
The PB2 and M of Spg119 closely cluster with Av39 and Av240, respectively, while the other six segments with Av154, indicating that Spg119 is a triple reassortant between Av39, Av240, and Av154; Spg119 acquired the PB2 segment from an Av39-like virus, the M segment from an Av240-like virus, and the other six segments from an Av154-like virus. From the described results, we proposed a model of the genesis of Av1955 by multi-steps of inter-and intra-subtype reassortment, as illustrated in Fig. 2. Characteristic amino acid sequences of Av1955 Table 2 summarizes the findings of an analysis of Av1955 amino acid sequences of the receptor binding, cleavage, and glycosylation sites in HA, deletions in NA and NS1, truncation of PB1-F2, and amantadine and rimantadine resistant mutations in M2. At the receptor binding region of the HA protein, the amino acid sequences were E-186, G-221, Q-222, and G-224 (H5 numbering), which are consistent with the avian type specifically binding 2,3-linked sialic acid receptor. There were no mutations in this region among the seven reference viruses. The amino acid sequence at the cleavage site of the HA protein was PQRE-RRRKR for Av1955, Av154, and the four donor strains and was PQRESRRKKR for Av240; all of these possessed five consecutive basic amino acid residues as typical highly pathogenic avian influenza A viruses. Av39 had PEKQT----R at the site, only one basic amino acid residue R, as specific low pathogenic viruses. A glycosylation site, NST, was present at 154-156 in Av240, while the particular sequence was absent in seven other viruses, including Av1955 as a result of substitution(s) to GST for Av39; DNA for Av154; NNA for Eg20616, Av1210, Spg119, and Av1534; and CNA for Av1955. There were deletions in NA at 49-68 and NS at 80-84 with Av1955 and all reference viruses except Av39. The Av1955 PB1-F2 was truncated at 25, while that of the reference viruses, except for Av39 and Av240, was truncated at 57. Av39 and Av240 possessed 87 open reading frames for PB1-F2, almost full of 90. The M2 of Av1955 was a resistant type to amantadine and rimantadine.  (4). We analyzed the presence or absence of these mutations in Av1955 and seven reference viruses. Table 3 summarizes the results showing the substitutions positive for at least one of the eight viruses. There were no marker mutations in NP and NA proteins. Av1955 had 11 markers in PB2, while the donor of Spg119 had 10. The increase indicates that the virus acquired one marker mutation in the PB2 (R318K) by a substitution mutation. A marker increase also occurred with NS1 (seven to eight by P215T). The markers of the donor segments were the same number as the recipient, Av1955, with PB1 (eight), PA (three), HA (five), M1 (3), M2 (zero), and NS2 (one). The markers in the transferred proteins were the maximum among the four donor viruses except for the HA. The HA donor, Spg119, had five marker mutations in HA, while Av1210 and Av1534 Multiple basic amino acid

DISCUSSION
Sequencing and phylogenetic analysis (Fig. 1)  The genesis of Av1955 might involve two steps of genome-segment reassortment (Fig. 2). It is worth noting that the M genome segment of the Indonesian lineage has remained in Subtotal There were no mammalian adaptation mutations in the receptor binding region of the HA protein among the seven reference viruses and Av1955. Their amino acid sequences were E-186, G-221, Q-222, and G-224 (H5 numbering), consistent with the avian type specifically binding to a 2,3-linked sialic acid receptor on the cell surface of avian hosts. If the mutations occur, viruses are unable to infect avian hosts. The HA protein of Av1955 contains an HPAI H5N1-type cleavage site sequence with multiple basic amino acid residues (PQRERRRKR). In contrast, the virus was isolated from a healthy chicken suggesting its low pathogenicity nature. All of the H5N1 isolates in East Java, Indonesia, since 2013 were from sick poultry except a few reassortants possessing a PB2 genome segment derived from avirulent subtypes like H3N6; Spg119, A/chicken/East Java/Spg119/ 2018 (H5N1), was one of the exceptions isolated from a healthy chicken (K. Rahardjo, 2019, unpublished data). The virus obtained its PB2 genome from an H3N6 virus, as shown in Fig. 2. Av1955 lost a glycosylation site at 154-156 on the HA protein; the loss has been reported to increase pathogenicity in mice (Suguitan et al., 2012). Av1955 has deletions at 49-68 on the NA and 80-84 on the NS1, both of which have been shown to increase virulence in mice (Matsuoka et al., 2009;Zhou et al., 2009;Long et al., 2008). The Av1955 PB1-F2 truncated at 25 of the total length, 90. The deletion of the main body has been related to increased virulence in mammals (Mettier et al., 2021). Av1955 has amino acid residues of A-27 and N-31 of the M2 protein, which are known to render the phenotype of resistance to amantadine and rimantadine (Belshe et al., 1988;Lan et al., 2010;Pinto, Holsinger & Lamb, 1992 (Dharmayanti, Ibrahim & Soebandrio, 2010). Since amantadine had been subscribed to influenza patients but never used for poultry in Indonesia, the avian viruses possessing the M segment with resistant mutations might come from human-to-avian transmission.
Av1955 possesses 39 mammalian adaptation marker mutations out of 149 identified by Mertens et al. (2013), while the four viruses of the genome-segment donor have 34-36 and the three ancestral viruses isolated during 2013-2014 have 27-34. The virus has increased the number of the markers by substitution mutations and intra-and inter-subtype reassortment gathering gene segments possessing the most abundant maker mutations among circulating viruses. The increasing mammalian adaptation mutation in avian hosts suggests that they might be adaptive to infection in mammalian and avian hosts. It highlights the importance of genomic surveillance and adequate control measures for H5N1 infection in live poultry markets. In contrast, we have not identified the remaining 110 mammalian adaptation marker mutations in the eight avian viruses analyzed in this study. One of the most studied mutations, PB2 E627K, has been known to release the restriction of PB2 activities in replication in mammalian cells (Subbarao, London & Murphy, 1993), including human volunteers (Clements et al., 1992). It is a plausible explanation that this kind of mammalian adaptation mutation arises only in the mammalian host after the transmission.

CONCLUSIONS
Av1955 is a virus of the H5N1 Eurasian lineage (HA clade 2.3.2.1c). The virus has increased mammalian adaptation markers by substitution mutation and intra-and inter-subtype reassortment, gathering gene segments possessing the most abundant maker mutations among circulating viruses. The increasing mammalian adaptation mutation in avian hosts suggests that they might be adaptive to infection in mammalian and avian hosts. It highlights the importance of genomic surveillance and adequate control measures for H5N1 infection in live poultry markets.

Data Availability
The following information was supplied regarding data availability: The raw measurements are available in the Supplemental Files.

Supplemental Information
Supplemental information for this article can be found online at http://dx.doi.org/10.7717/ peerj.14917#supplemental-information.